This notebook analyzes Seattle Airbnb data.

Import Libraries

In [50]:
import datetime
import pandas as pd
import numpy as np
import folium
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as col
import matplotlib.dates as mdates
import seaborn as sns
import os
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from time import time

%matplotlib inline 
In [2]:
processed_path = "../data/processed/"

Load Data

Listings

In [3]:
seattle_data_listings_df = pd.read_csv("../data/seattle/listings.csv")
In [4]:
seattle_data_listings_df.head()
Out[4]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview ... review_scores_value requires_license license jurisdiction_names instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
0 241032 https://www.airbnb.com/rooms/241032 20160104002432 2016-01-04 Stylish Queen Anne Apartment NaN Make your self at home in this charming one-be... Make your self at home in this charming one-be... none NaN ... 10.0 f NaN WASHINGTON f moderate f f 2 4.07
1 953595 https://www.airbnb.com/rooms/953595 20160104002432 2016-01-04 Bright & Airy Queen Anne Apartment Chemically sensitive? We've removed the irrita... Beautiful, hypoallergenic apartment in an extr... Chemically sensitive? We've removed the irrita... none Queen Anne is a wonderful, truly functional vi... ... 10.0 f NaN WASHINGTON f strict t t 6 1.48
2 3308979 https://www.airbnb.com/rooms/3308979 20160104002432 2016-01-04 New Modern House-Amazing water view New modern house built in 2013. Spectacular s... Our house is modern, light and fresh with a wa... New modern house built in 2013. Spectacular s... none Upper Queen Anne is a charming neighborhood fu... ... 10.0 f NaN WASHINGTON f strict f f 2 1.15
3 7421966 https://www.airbnb.com/rooms/7421966 20160104002432 2016-01-04 Queen Anne Chateau A charming apartment that sits atop Queen Anne... NaN A charming apartment that sits atop Queen Anne... none NaN ... NaN f NaN WASHINGTON f flexible f f 1 NaN
4 278830 https://www.airbnb.com/rooms/278830 20160104002432 2016-01-04 Charming craftsman 3 bdm house Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... none We are in the beautiful neighborhood of Queen ... ... 9.0 f NaN WASHINGTON f strict f f 1 0.89

5 rows × 92 columns

In [5]:
seattle_data_listings_df.shape
Out[5]:
(3818, 92)
In [6]:
seattle_data_listings_df.columns
Out[6]:
Index(['id', 'listing_url', 'scrape_id', 'last_scraped', 'name', 'summary',
       'space', 'description', 'experiences_offered', 'neighborhood_overview',
       'notes', 'transit', 'thumbnail_url', 'medium_url', 'picture_url',
       'xl_picture_url', 'host_id', 'host_url', 'host_name', 'host_since',
       'host_location', 'host_about', 'host_response_time',
       'host_response_rate', 'host_acceptance_rate', 'host_is_superhost',
       'host_thumbnail_url', 'host_picture_url', 'host_neighbourhood',
       'host_listings_count', 'host_total_listings_count',
       'host_verifications', 'host_has_profile_pic', 'host_identity_verified',
       'street', 'neighbourhood', 'neighbourhood_cleansed',
       'neighbourhood_group_cleansed', 'city', 'state', 'zipcode', 'market',
       'smart_location', 'country_code', 'country', 'latitude', 'longitude',
       'is_location_exact', 'property_type', 'room_type', 'accommodates',
       'bathrooms', 'bedrooms', 'beds', 'bed_type', 'amenities', 'square_feet',
       'price', 'weekly_price', 'monthly_price', 'security_deposit',
       'cleaning_fee', 'guests_included', 'extra_people', 'minimum_nights',
       'maximum_nights', 'calendar_updated', 'has_availability',
       'availability_30', 'availability_60', 'availability_90',
       'availability_365', 'calendar_last_scraped', 'number_of_reviews',
       'first_review', 'last_review', 'review_scores_rating',
       'review_scores_accuracy', 'review_scores_cleanliness',
       'review_scores_checkin', 'review_scores_communication',
       'review_scores_location', 'review_scores_value', 'requires_license',
       'license', 'jurisdiction_names', 'instant_bookable',
       'cancellation_policy', 'require_guest_profile_picture',
       'require_guest_phone_verification', 'calculated_host_listings_count',
       'reviews_per_month'],
      dtype='object')
In [7]:
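# Whole years since the host joined, measured from the time the notebook is run.
# (astype('timedelta64[Y]') is rejected by recent pandas versions, so day counts are used instead.)
seattle_data_listings_df['host_since_in_years'] = (datetime.datetime.now() - pd.to_datetime(seattle_data_listings_df['host_since'])).dt.days // 365.25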
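
Because the expression above is relative to the current date, the tenure values change each time the notebook is run. The following is a minimal sketch of a reproducible variant, assuming tenure should instead be measured against the scrape date (2016-01-04, as shown in the last_scraped column). The names toy_listings and reference_date are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the listings frame; only host_since is needed here.
toy_listings = pd.DataFrame({"host_since": ["2011-08-11", "2013-02-21", None]})

# Assumption: measure tenure against the scrape date rather than "now".
reference_date = pd.Timestamp("2016-01-04")

host_since = pd.to_datetime(toy_listings["host_since"])
# Whole years elapsed; a missing host_since yields NaN.
toy_listings["host_since_in_years"] = (reference_date - host_since).dt.days // 365.25

print(toy_listings)
```

With a fixed reference date, rerunning the notebook later gives the same tenure values.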
In [8]:
seattle_data_listings_df
Out[8]:
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview ... requires_license license jurisdiction_names instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month host_since_in_years
0 241032 https://www.airbnb.com/rooms/241032 20160104002432 2016-01-04 Stylish Queen Anne Apartment NaN Make your self at home in this charming one-be... Make your self at home in this charming one-be... none NaN ... f NaN WASHINGTON f moderate f f 2 4.07 7.0
1 953595 https://www.airbnb.com/rooms/953595 20160104002432 2016-01-04 Bright & Airy Queen Anne Apartment Chemically sensitive? We've removed the irrita... Beautiful, hypoallergenic apartment in an extr... Chemically sensitive? We've removed the irrita... none Queen Anne is a wonderful, truly functional vi... ... f NaN WASHINGTON f strict t t 6 1.48 5.0
2 3308979 https://www.airbnb.com/rooms/3308979 20160104002432 2016-01-04 New Modern House-Amazing water view New modern house built in 2013. Spectacular s... Our house is modern, light and fresh with a wa... New modern house built in 2013. Spectacular s... none Upper Queen Anne is a charming neighborhood fu... ... f NaN WASHINGTON f strict f f 2 1.15 4.0
3 7421966 https://www.airbnb.com/rooms/7421966 20160104002432 2016-01-04 Queen Anne Chateau A charming apartment that sits atop Queen Anne... NaN A charming apartment that sits atop Queen Anne... none NaN ... f NaN WASHINGTON f flexible f f 1 NaN 4.0
4 278830 https://www.airbnb.com/rooms/278830 20160104002432 2016-01-04 Charming craftsman 3 bdm house Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... Cozy family craftman house in beautiful neighb... none We are in the beautiful neighborhood of Queen ... ... f NaN WASHINGTON f strict f f 1 0.89 6.0
5 5956968 https://www.airbnb.com/rooms/5956968 20160104002432 2016-01-04 Private unit in a 1920s mansion We're renting out a small private unit of one ... If you include a bit of your background in you... We're renting out a small private unit of one ... none This part of Queen Anne has wonderful views an... ... f NaN WASHINGTON f strict f f 1 2.45 7.0
6 1909058 https://www.airbnb.com/rooms/1909058 20160104002432 2016-01-04 Queen Anne Private Bed and Bath Enjoy a quiet stay in our comfortable 1915 Cra... Enjoy a quiet stay in our comfortable 1915 Cra... Enjoy a quiet stay in our comfortable 1915 Cra... none Close restaurants, coffee shops and grocery st... ... f NaN WASHINGTON f moderate f f 1 2.46 6.0
7 856550 https://www.airbnb.com/rooms/856550 20160104002432 2016-01-04 Tiny Garden cabin on Queen Anne Our tiny cabin is private , very quiet and com... This cabin was built with Airbnb in mind, Que... Our tiny cabin is private , very quiet and com... none We are centrally located between Downtown and ... ... f NaN WASHINGTON f strict t t 5 4.73 5.0
8 4948745 https://www.airbnb.com/rooms/4948745 20160104002432 2016-01-04 Urban Charm || Downtown || Views Nestled in the heart of the city, this space i... Located in the heart of the city, this space i... Nestled in the heart of the city, this space i... none Walking Score: 92 4 blocks from Kerry Park Fam... ... f NaN WASHINGTON f strict f f 1 1.22 6.0
9 2493658 https://www.airbnb.com/rooms/2493658 20160104002432 2016-01-04 Airy + Bright Queen Anne Apartment Beautiful apartment in an extremely safe, quie... What's special about this place? A beautiful r... Beautiful apartment in an extremely safe, quie... none Queen Anne is a wonderful, truly functional vi... ... f NaN WASHINGTON f strict t t 6 1.55 5.0
10 175576 https://www.airbnb.com/rooms/175576 20160104002432 2016-01-04 Private Apartment - Queen Anne Hill Queen Anne Hill is a charming neighborhood wit... Be close to everything! Queen Anne Hill is a ... Queen Anne Hill is a charming neighborhood wit... none Queen Anne Hill is a wonderful and historic ar... ... f NaN WASHINGTON f moderate t f 1 3.33 7.0
11 4454295 https://www.airbnb.com/rooms/4454295 20160104002432 2016-01-04 Upper Queen Anne Craftsman House Beautifully furnished, cozy 1 bedroom mid cent... Beautiful home in an extremely walkable neighb... Beautifully furnished, cozy 1 bedroom mid cent... none I am located in the Upper Queen Anne neighborh... ... f NaN WASHINGTON f strict f f 1 0.98 5.0
12 3883392 https://www.airbnb.com/rooms/3883392 20160104002432 2016-01-04 Open Plan 2bdr/1bath in Queen Anne Spacious apt in popular Seattle neighborhood. ... This apartment is in a quiet and friendly city... Spacious apt in popular Seattle neighborhood. ... none This neighborhood is one of Seattle's popular ... ... f NaN WASHINGTON f moderate f f 1 0.92 4.0
13 8889257 https://www.airbnb.com/rooms/8889257 20160104002432 2016-01-04 Elegance in Historic Queen Anne Enjoy our amazing, updated & modern design cot... Originally built in 1906, our house has a ligh... Enjoy our amazing, updated & modern design cot... none Queen Anne hill became a popular spot for the ... ... f NaN WASHINGTON f strict f f 1 3.00 3.0
14 5680462 https://www.airbnb.com/rooms/5680462 20160104002432 2016-01-04 Stunning 6 bd in THE BEST Location! Stunning Designsponge featured 6 bed, 3.75 bat... Gorgeous, LIGHT FILLED, Newly Constructed Mode... Stunning Designsponge featured 6 bed, 3.75 bat... none Queen Anne is THE BEST and most desirable neig... ... f NaN WASHINGTON t strict f f 1 2.65 3.0
15 8988178 https://www.airbnb.com/rooms/8988178 20160104002432 2016-01-04 Lovely Queen Anne Cottage, 2 BR This home is full of light, art and comfort. 5... The Space This is a 1000 square foot, two bedr... This home is full of light, art and comfort. 5... none Queen Anne is a charming and very safe neighbo... ... f NaN WASHINGTON f strict f f 1 0.73 6.0
16 3245876 https://www.airbnb.com/rooms/3245876 20160104002432 2016-01-04 Park Life in Lower Queen Anne Master bedroom suite with 1/4 bath & kitchenet... **PLEASE MAKE SURE TO READ ALL INFO BEFORE BOO... Master bedroom suite with 1/4 bath & kitchenet... none Lower Queen Anne is amazing - you can walk to ... ... f NaN WASHINGTON f moderate f f 1 4.55 4.0
17 4933447 https://www.airbnb.com/rooms/4933447 20160104002432 2016-01-04 Private Garden Suite, Bay View Beautiful private entrance garden suite overlo... French Country style home built in 1939 with s... Beautiful private entrance garden suite overlo... none We are located in Upper Queen Anne in walking ... ... f NaN WASHINGTON t moderate f f 1 4.58 3.0
18 7735464 https://www.airbnb.com/rooms/7735464 20160104002432 2016-01-04 Queen Anne Getaway Near Seattle! The second room in our spacious 2BR / 2 Bath a... NaN The second room in our spacious 2BR / 2 Bath a... none NaN ... f NaN WASHINGTON f flexible f f 1 NaN 5.0
19 6291829 https://www.airbnb.com/rooms/6291829 20160104002432 2016-01-04 Grand Craftsman Home on Queen Anne This home built in 1909. It has 5 bedrooms a... This home encompasses the character that is so... This home built in 1909. It has 5 bedrooms a... none Upper Queen Anne is a charming neighborhood wi... ... f NaN WASHINGTON f flexible f f 1 0.82 3.0
20 9218403 https://www.airbnb.com/rooms/9218403 20160104002432 2016-01-04 Queen Anne View One Bedroom This clean and comfortable one bedroom sits ri... Kitchen has hot water tap and sodastream Excel... This clean and comfortable one bedroom sits ri... none Lower Queen Anne is near the Seattle Center (s... ... f NaN WASHINGTON f flexible f f 1 1.00 4.0
21 4125779 https://www.airbnb.com/rooms/4125779 20160104002432 2016-01-04 Cozy Queen Anne Finished Basement Relax in your own private finished basement sp... Updated daylight basement space with queen and... Relax in your own private finished basement sp... none Just a few minutes walk to many great restaura... ... f NaN WASHINGTON f moderate f f 1 0.71 3.0
22 8942678 https://www.airbnb.com/rooms/8942678 20160104002432 2016-01-04 Lovely Queen Anne home Welcome to Seattle! Enjoy your stay in a turn... Centrally located spacious home in the heart o... Welcome to Seattle! Enjoy your stay in a turn... none Queen Anne is a wonderful mix of beautiful tre... ... f NaN WASHINGTON f flexible f f 1 0.86 2.0
23 10106055 https://www.airbnb.com/rooms/10106055 20160104002432 2016-01-04 Cozy Lower Level, Upper Queen Anne Greetings! Our home is a beautiful, 1920's 4 ... The space we are renting is the entire lower l... Greetings! Our home is a beautiful, 1920's 4 ... none Queen Anne is a lovely and venerable Seattle n... ... f NaN WASHINGTON f flexible f f 1 NaN 2.0
24 6362362 https://www.airbnb.com/rooms/6362362 20160104002432 2016-01-04 Charming home on Queen Anne Enjoy Seattle from the ideally located Queen A... Step into the main living space with sitting a... Enjoy Seattle from the ideally located Queen A... none NaN ... f NaN WASHINGTON f moderate f f 1 0.18 3.0
25 3544550 https://www.airbnb.com/rooms/3544550 20160104002432 2016-01-04 VIEW-Monthly Rental Available Our home is very light and full of character a... Our home is a brick tudor/cottage built in 193... Our home is very light and full of character a... none Upper Queen Anne is a charming neighborhood fu... ... f NaN WASHINGTON f strict f f 2 1.30 4.0
26 9025039 https://www.airbnb.com/rooms/9025039 20160104002432 2016-01-04 Cute Bungalow-Close to Everything! Our cozy little bungalow is the perfect place ... One bedroom has a queen sized bed, and the oth... Our cozy little bungalow is the perfect place ... none The Queen Anne neighborhood is one of the most... ... f NaN WASHINGTON f flexible f f 1 NaN 8.0
27 3200646 https://www.airbnb.com/rooms/3200646 20160104002432 2016-01-04 Micro Gypsy Wagon on Queen Anne Micro camper with queen size bed and seating f... The Spruce Kaboose is a hand built one of a ki... Micro camper with queen size bed and seating f... none Interbay is right between downtown Seattle and... ... f NaN WASHINGTON f strict t t 5 4.38 5.0
28 8859380 https://www.airbnb.com/rooms/8859380 20160104002432 2016-01-04 STUNNING VIEW of Puget Sound This three bedroom flat is your gorgeous retre... You won't want to leave your home away from ho... This three bedroom flat is your gorgeous retre... none Located in the beautiful neighborhood of Queen... ... f NaN WASHINGTON t moderate f f 1 2.47 5.0
29 4520099 https://www.airbnb.com/rooms/4520099 20160104002432 2016-01-04 Open Airy Queen Anne Condo Designer home situated on Queen Anne Hill over... During the summer, there was construction on t... Designer home situated on Queen Anne Hill over... none This is a really quiet neighborhood with plent... ... f NaN WASHINGTON f strict f f 1 0.33 5.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3788 7745196 https://www.airbnb.com/rooms/7745196 20160104002432 2016-01-04 Cozy Modern Apartment in Fremont Our apartment is bright and airy and a modern ... Our apartment is brand new construction with t... Our apartment is bright and airy and a modern ... none The Fremont neighborhood of Seattle is known a... ... f NaN WASHINGTON f strict f f 1 3.39 7.0
3789 4645405 https://www.airbnb.com/rooms/4645405 20160104002432 2016-01-04 Private light, minimalist room The extremely comfortable sleep number Califor... Vintage, modern 1910 craftsman with dark expos... The extremely comfortable sleep number Califor... none I love that I can walk 15 minutes into Fremont... ... f NaN WASHINGTON f flexible f f 2 2.14 4.0
3790 2755730 https://www.airbnb.com/rooms/2755730 20160104002432 2016-01-04 Your own 2 Bedroom home in Fremont Private 2 bedroom retreat in the heart of Frem... The house was built in 1906 by the Seattle Lum... Private 2 bedroom retreat in the heart of Frem... none Very walkable, with lots of bike lanes and qui... ... f NaN WASHINGTON t strict f f 1 3.77 5.0
3791 7949448 https://www.airbnb.com/rooms/7949448 20160104002432 2016-01-04 Private Suite in Fremont-Ballard This is a private suite with bedroom and bathr... This is a ground floor bedroom in our townhous... This is a private suite with bedroom and bathr... none Our house is centrally located between Ballard... ... f NaN WASHINGTON f moderate f f 1 2.48 3.0
3792 9288840 https://www.airbnb.com/rooms/9288840 20160104002432 2016-01-04 City Haven Vacationing in Seattle? Traveling for business... NaN Vacationing in Seattle? Traveling for business... none NaN ... f NaN WASHINGTON f flexible f f 1 NaN 4.0
3793 3312406 https://www.airbnb.com/rooms/3312406 20160104002432 2016-01-04 Seattle Urban-Chic Studio Cottage Come stay at this 2014 Northwest Green Home To... Private, just-built, high-end studio loft in a... Come stay at this 2014 Northwest Green Home To... none Renowned restaurants, coffeshops and bars, the... ... f NaN WASHINGTON t strict f f 1 4.28 4.0
3794 6621924 https://www.airbnb.com/rooms/6621924 20160104002432 2016-01-04 Hazel Heights Hideout An one-bedroom apartment in the top level of m... Enter via the big back deck which has views of... An one-bedroom apartment in the top level of m... none Fremont - The Center Of The Universe. This ne... ... f NaN WASHINGTON f flexible f f 1 3.10 3.0
3795 5673552 https://www.airbnb.com/rooms/5673552 20160104002432 2016-01-04 Beautiful loft in downtown Fremont Our cool loft apartment offers a convenient lo... This historic building houses a converted stor... Our cool loft apartment offers a convenient lo... none Fremont is a unique, young neighborhood filled... ... f NaN WASHINGTON f flexible f f 1 1.18 3.0
3796 609701 https://www.airbnb.com/rooms/609701 20160104002432 2016-01-04 Charming Fremont Garden Cottage Enjoy the excitement of Seattle and city life ... Charming and pristine pied-a-terre in the mids... Enjoy the excitement of Seattle and city life ... none FREMONT is such a fun neighborhood with lot's ... ... f NaN WASHINGTON t moderate f f 1 5.57 6.0
3797 10118341 https://www.airbnb.com/rooms/10118341 20160104002432 2016-01-04 Fremont Lighthouse Mother-in-Law A clean and simple mother-in-law across the st... A clean and simple mother-in-law basement stud... A clean and simple mother-in-law across the st... none Located at the top of Fremont neighborhood/up ... ... f NaN WASHINGTON f strict f f 1 1.00 6.0
3798 2614387 https://www.airbnb.com/rooms/2614387 20160104002432 2016-01-04 Everything Seattle in 0-3 miles, A Spacious 1 bedroom: private 3/4 bathroom, Quee... New, modern, clean, quiet and no clutter, yet ... Spacious 1 bedroom: private 3/4 bathroom, Quee... none I love everything about my neighborhood, Seatt... ... f NaN WASHINGTON f strict f f 2 5.95 7.0
3799 7735100 https://www.airbnb.com/rooms/7735100 20160104002432 2016-01-04 Master bedroom in Fremont The townhouse is conveniently blocks away from... NaN The townhouse is conveniently blocks away from... none NaN ... f NaN WASHINGTON f moderate f f 1 1.06 5.0
3800 5482204 https://www.airbnb.com/rooms/5482204 20160104002432 2016-01-04 Six room suite Seattle's most elegant and romantic lit... Each suite is 900 square feet of luxury and el... Seattle's most elegant and romantic lit... none Chelsea Station Inn is walking distance to the... ... f NaN WASHINGTON f strict f f 1 NaN 3.0
3801 4524575 https://www.airbnb.com/rooms/4524575 20160104002432 2016-01-04 Comfortable Fremont Apartment This lovely apartment is located on the lower ... Our recently renovated apartment is perfect fo... This lovely apartment is located on the lower ... none Fremont is centrally located within the city a... ... f NaN WASHINGTON f moderate f f 1 2.05 4.0
3802 8562314 https://www.airbnb.com/rooms/8562314 20160104002432 2016-01-04 Cozy Craftsman in Heart of Fremont Our rare for its location, 2 bedroom and 1 bat... In a land ruled by condos, it's hard to find a... Our rare for its location, 2 bedroom and 1 bat... none I loved living here before we had 3 kids. The ... ... f NaN WASHINGTON f flexible f f 1 NaN 4.0
3803 9698202 https://www.airbnb.com/rooms/9698202 20160104002432 2016-01-04 Clean City House, Lots of Beds Get anywhere in Seattle quickly from this grea... The space is well suited for large groups and ... Get anywhere in Seattle quickly from this grea... none This house is central to any neighborhood you'... ... f NaN WASHINGTON f moderate f f 8 4.00 5.0
3804 7178490 https://www.airbnb.com/rooms/7178490 20160104002432 2016-01-04 Cedar House Studio Suite in Fremont Comfortable and clean, this lower level suite ... A lush, private garden path leads to the entra... Comfortable and clean, this lower level suite ... none Upper Fremont is a vibrant neighborhood that o... ... f NaN WASHINGTON f strict f f 1 2.34 7.0
3805 8054902 https://www.airbnb.com/rooms/8054902 20160104002432 2016-01-04 2 BR/1 BA Fremont Apt w/ parking This 2 bedroom, 1 bath garden-level apartment ... The Space This 2 bedroom, 1 bath garden-level ... This 2 bedroom, 1 bath garden-level apartment ... none One of the most fun neighborhoods in Seattle. ... ... f NaN WASHINGTON f strict f f 1 3.95 5.0
3806 5458027 https://www.airbnb.com/rooms/5458027 20160104002432 2016-01-04 Sunny Charm in Urban Cottage 2BD Charming urban home with rustic cottage feel. ... NaN Charming urban home with rustic cottage feel. ... none NaN ... f NaN WASHINGTON f flexible f f 2 0.24 5.0
3807 4940491 https://www.airbnb.com/rooms/4940491 20160104002432 2016-01-04 Roof Deck in Fremont/Wallingford Adorable private studio nestled in one of Seat... The studio is filled with beautiful original a... Adorable private studio nestled in one of Seat... none From the apartment it is an easy walk to the h... ... f NaN WASHINGTON f strict f f 1 0.78 7.0
3808 1844791 https://www.airbnb.com/rooms/1844791 20160104002432 2016-01-04 Beautiful Craftsman - Fremont 3 Bed Our charming home in fabulous Upper Fremont is... This is a classically beautiful Craftsman home... Our charming home in fabulous Upper Fremont is... none Fremont is wonderful; you will love it here. T... ... f NaN WASHINGTON f strict f f 2 1.15 5.0
3809 6120046 https://www.airbnb.com/rooms/6120046 20160104002432 2016-01-04 Lake Veiw Cottage in Fremont From the deck of this quaint little apartment ... This one bedroom apartment (Hippy Shack) is de... From the deck of this quaint little apartment ... none This cottage apartment is located just a few b... ... f NaN WASHINGTON f strict f f 1 1.18 3.0
3810 262764 https://www.airbnb.com/rooms/262764 20160104002432 2016-01-04 Fremont Farmhouse Our 2BR/1 bath home in Fremont's most fun neig... 1200 ft2 2BR, 1 Bath farmhouse with a large, t... Our 2BR/1 bath home in Fremont's most fun neig... none We love that Fremont is so centrally located t... ... f NaN WASHINGTON f strict f f 1 1.56 6.0
3811 8578490 https://www.airbnb.com/rooms/8578490 20160104002432 2016-01-04 Super Convenient Top Floor Apt In the true spirit of AirBNB this unit is avai... The building is older (1970's) so not the mode... In the true spirit of AirBNB this unit is avai... none I am equidistant to Fremont and Wallingford an... ... f NaN WASHINGTON f moderate f f 1 0.63 4.0
3812 3383329 https://www.airbnb.com/rooms/3383329 20160104002432 2016-01-04 OF THE TREE & CLOUDS. KID FRIENDLY! Of the Tree & Clouds' "Roots" apartment is 8 b... The 1-bedroom, ground-floor Roots apartment is... Of the Tree & Clouds' "Roots" apartment is 8 b... none It's taken me a while to figure out why Fremon... ... f NaN WASHINGTON t moderate t t 3 4.01 6.0
3813 8101950 https://www.airbnb.com/rooms/8101950 20160104002432 2016-01-04 3BR Mountain View House in Seattle Our 3BR/2BA house boasts incredible views of t... Our 3BR/2BA house bright, stylish, and wheelch... Our 3BR/2BA house boasts incredible views of t... none We're located near lots of family fun. Woodlan... ... f NaN WASHINGTON f strict f f 8 0.30 3.0
3814 8902327 https://www.airbnb.com/rooms/8902327 20160104002432 2016-01-04 Portage Bay View!-One Bedroom Apt 800 square foot 1 bedroom basement apartment w... This space has a great view of Portage Bay wit... 800 square foot 1 bedroom basement apartment w... none The neighborhood is a quiet oasis that is clos... ... f NaN WASHINGTON f moderate f f 1 2.00 2.0
3815 10267360 https://www.airbnb.com/rooms/10267360 20160104002432 2016-01-04 Private apartment view of Lake WA Very comfortable lower unit. Quiet, charming m... NaN Very comfortable lower unit. Quiet, charming m... none NaN ... f NaN WASHINGTON f moderate f f 1 NaN 2.0
3816 9604740 https://www.airbnb.com/rooms/9604740 20160104002432 2016-01-04 Amazing View with Modern Comfort! Cozy studio condo in the heart on Madison Park... Fully furnished unit to accommodate most needs... Cozy studio condo in the heart on Madison Park... none Madison Park offers a peaceful slow pace upsca... ... f NaN WASHINGTON f moderate f f 1 NaN 3.0
3817 10208623 https://www.airbnb.com/rooms/10208623 20160104002432 2016-01-04 Large Lakefront Apartment All hardwood floors, fireplace, 65" TV with Xb... NaN All hardwood floors, fireplace, 65" TV with Xb... none NaN ... f NaN WASHINGTON f flexible f f 1 NaN 4.0

3818 rows × 93 columns

Data Preparation:

Strip the % and $ signs (and thousands separators) from the fields below, then convert them to float

In [9]:
# Raw strings avoid invalid-escape warnings in the regex patterns
seattle_data_listings_df["host_response_rate"] = seattle_data_listings_df["host_response_rate"].replace(r'[%,]', '', regex=True).astype(float)
In [10]:
seattle_data_listings_df["price"] = seattle_data_listings_df["price"].replace(r'[\$,]', '', regex=True).astype(float)
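
The effect of this cleaning step can be checked on a few hand-made values. This is only a sketch: the toy frame raw and its values are made up for illustration, and the patterns are the same character classes used above.

```python
import pandas as pd

# Toy values in the same textual format as the raw listings columns.
raw = pd.DataFrame({
    "host_response_rate": ["96%", "100%", None],
    "price": ["$85.00", "$1,000.00", "$150.00"],
})

# The comma inside the character class also strips thousands separators
# such as "1,000"; missing values stay missing and become NaN after the cast.
cleaned = pd.DataFrame({
    "host_response_rate": raw["host_response_rate"].replace(r"[%,]", "", regex=True).astype(float),
    "price": raw["price"].replace(r"[\$,]", "", regex=True).astype(float),
})

print(cleaned)
```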

Data Preparation : Removing Constants

In [11]:
# Keep only the columns that have more than one distinct (non-missing) value
seattle_data_listings_df = seattle_data_listings_df.loc[:, seattle_data_listings_df.nunique() > 1]
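
A small sketch of what this filter removes, using a hypothetical toy frame. Note that nunique() ignores NaN by default, so a column that is entirely missing is dropped along with columns that repeat a single value.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "id": [1, 2, 3],
    "scrape_id": [20160104002432] * 3,     # same value in every row
    "license": [np.nan, np.nan, np.nan],   # entirely missing
    "price": [85.0, 150.0, 85.0],
})

# Number of distinct non-missing values per column.
distinct_counts = toy.nunique()

# Keep only the columns with at least two distinct values.
kept = toy.loc[:, distinct_counts > 1]
print(list(kept.columns))
```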

Data Understanding: Understand the percentiles of price and availability fields

In [12]:
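# Pass the data by keyword; recent seaborn versions no longer accept it as the first positional argument
ax = sns.boxplot(x=seattle_data_listings_df["price"])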
plt.title("Box plot - Price Field")
plt.show()
In [13]:
def print_percentiles(field):
    """
    Print the 5th, 25th, 50th, 75th and 95th percentiles of a field.
    field : pandas Series holding the values (its name is used as the label)
    returns : None
    """
    for q in (5, 25, 50, 75, 95):
        print("{0}th percentile for {1} field is {2}".format(q, field.name, np.percentile(field, q)))
In [14]:
print_percentiles(seattle_data_listings_df["price"])
5th percentile for price field is 45.0
25th percentile for price field is 75.0
50th percentile for price field is 100.0
75th percentile for price field is 150.0
95th percentile for price field is 299.0
In [15]:
ax = sns.boxplot(x=seattle_data_listings_df["availability_365"])
plt.title("Box plot - Availability 365 Field")
plt.show()
In [16]:
print_percentiles(seattle_data_listings_df["availability_365"])
5th percentile for availability_365 field is 14.0
25th percentile for availability_365 field is 124.0
50th percentile for availability_365 field is 308.0
75th percentile for availability_365 field is 360.0
95th percentile for availability_365 field is 365.0
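
The same five percentiles can also be obtained in one call with Series.quantile, which skips missing values, whereas np.percentile returns NaN if the input contains any. The sketch below uses a made-up five-value series in place of the listings column.

```python
import pandas as pd

# Hypothetical stand-in for a numeric listings column.
prices = pd.Series([45.0, 75.0, 100.0, 150.0, 299.0], name="price")

# quantile() takes fractions in [0, 1] and returns a Series indexed by them.
summary = prices.quantile([0.05, 0.25, 0.50, 0.75, 0.95])

for q, value in summary.items():
    print(f"{int(round(q * 100))}th percentile for {prices.name} field is {value}")
```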

Load Calendar Data

In [17]:
seattle_data_calendar_df = pd.read_csv("../data/seattle/calendar.csv")
In [18]:
seattle_data_calendar_df.head()
Out[18]:
listing_id date available price
0 241032 2016-01-04 t $85.00
1 241032 2016-01-05 t $85.00
2 241032 2016-01-06 f NaN
3 241032 2016-01-07 f NaN
4 241032 2016-01-08 f NaN
In [19]:
seattle_data_calendar_df.columns
Out[19]:
Index(['listing_id', 'date', 'available', 'price'], dtype='object')
In [20]:
seattle_data_calendar_df["listing_id"].nunique()
Out[20]:
3818

Descriptive Statistics of the data

In [21]:
def get_stats(df, save_to_file_name):
    """
    Purpose: Compute descriptive statistics for every field in the dataframe
    df : dataframe
    save_to_file_name : name of the CSV file (under processed_path) to save the statistics to
    returns : dataframe with one row of statistics per field
    """
    summary_df = df.describe(include='all').T.reset_index()
    summary_df.to_csv(os.path.join(processed_path, save_to_file_name))
    return summary_df
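
To read the tables that follow: describe(include='all') fills count, unique, top and freq for text columns, and mean, std and the quartiles for numeric columns, leaving NaN wherever a statistic does not apply. The sketch below shows this on a hypothetical two-column frame and skips the CSV export.

```python
import pandas as pd

toy = pd.DataFrame({
    "listing_id": [241032, 241032, 953595],
    "available": ["t", "f", "t"],
})

# One row per original column, one column per statistic.
summary_df = toy.describe(include="all").T.reset_index()
print(summary_df[["index", "count", "unique", "top", "freq", "mean"]])
```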

Calendar Data : Descriptive Statistics

In [22]:
get_stats(seattle_data_calendar_df, "seattle_calendar_stats.csv").head(20)
Out[22]:
index count unique top freq mean std min 25% 50% 75% max
0 listing_id 1.39357e+06 NaN NaN NaN 5.55011e+06 2.96227e+06 3335 3.25821e+06 6.11824e+06 8.03521e+06 1.03402e+07
1 date 1393570 365 2016-05-04 3818 NaN NaN NaN NaN NaN NaN NaN
2 available 1393570 2 t 934542 NaN NaN NaN NaN NaN NaN NaN
3 price 934542 669 $150.00 36646 NaN NaN NaN NaN NaN NaN NaN

Listings Data : Descriptive Statistics

In [23]:
get_stats(seattle_data_listings_df, "seattle_listing_stats.csv").head(20)
Out[23]:
index count unique top freq mean std min 25% 50% 75% max
0 id 3818 NaN NaN NaN 5.55011e+06 2.96266e+06 3335 3.25826e+06 6.11824e+06 8.03513e+06 1.03402e+07
1 listing_url 3818 3818 https://www.airbnb.com/rooms/3543247 1 NaN NaN NaN NaN NaN NaN NaN
2 name 3818 3792 Capitol Hill Apartment 3 NaN NaN NaN NaN NaN NaN NaN
3 summary 3641 3478 This is a modern fully-furnished studio apartm... 15 NaN NaN NaN NaN NaN NaN NaN
4 space 3249 3119 *Note: This fall, there will be major renovati... 14 NaN NaN NaN NaN NaN NaN NaN
5 description 3818 3742 Our space is a mix of a hostel and a home. We ... 10 NaN NaN NaN NaN NaN NaN NaN
6 neighborhood_overview 2786 2506 Wallingford is a mostly-residential neighborho... 17 NaN NaN NaN NaN NaN NaN NaN
7 notes 2212 1999 All of our rentals are fully licensed and regu... 39 NaN NaN NaN NaN NaN NaN NaN
8 transit 2884 2574 Convenient public transportation. The location... 32 NaN NaN NaN NaN NaN NaN NaN
9 thumbnail_url 3498 3498 https://a2.muscache.com/ac/pictures/226cdb2c-a... 1 NaN NaN NaN NaN NaN NaN NaN
10 medium_url 3498 3498 https://a0.muscache.com/im/pictures/32674198/e... 1 NaN NaN NaN NaN NaN NaN NaN
11 picture_url 3818 3818 https://a2.muscache.com/ac/pictures/33456872/e... 1 NaN NaN NaN NaN NaN NaN NaN
12 xl_picture_url 3498 3498 https://a0.muscache.com/ac/pictures/87210986/1... 1 NaN NaN NaN NaN NaN NaN NaN
13 host_id 3818 NaN NaN NaN 1.57856e+07 1.45838e+07 4193 3.2752e+06 1.05581e+07 2.59031e+07 5.32086e+07
14 host_url 3818 2751 https://www.airbnb.com/users/show/8534462 46 NaN NaN NaN NaN NaN NaN NaN
15 host_name 3816 1466 Andrew 56 NaN NaN NaN NaN NaN NaN NaN
16 host_since 3816 1380 2013-08-30 51 NaN NaN NaN NaN NaN NaN NaN
17 host_location 3810 120 Seattle, Washington, United States 3259 NaN NaN NaN NaN NaN NaN NaN
18 host_about 2959 2011 It would be my pleasure to share and explore t... 46 NaN NaN NaN NaN NaN NaN NaN
19 host_response_time 3295 4 within an hour 1692 NaN NaN NaN NaN NaN NaN NaN

Business Question 1: Which months of the year are busiest for Airbnb in Seattle?

Approach: In the calendar data, a price is present only on dates when a listing is available. Grouping those rows by month and summing the price gives a rough measure of how busy each month is.
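
As a quick illustration of this approach, the sketch below applies it to a hypothetical five-row calendar. The values are made up; the column names follow calendar.csv.

```python
import pandas as pd

toy_calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2],
    "date": ["2016-01-04", "2016-01-05", "2016-02-01", "2016-01-04", "2016-02-01"],
    "available": ["t", "f", "t", "t", "t"],
    "price": ["$85.00", None, "$90.00", "$1,200.00", "$100.00"],
})

# Parse dates, convert prices to numbers, and derive the month.
toy_calendar["date"] = pd.to_datetime(toy_calendar["date"])
toy_calendar["price"] = toy_calendar["price"].replace(r"[\$,]", "", regex=True).astype(float)
toy_calendar["month"] = toy_calendar["date"].dt.month

# Rows without a price (unavailable dates) are dropped before summing.
monthly_total = toy_calendar.dropna(subset=["price"]).groupby("month")["price"].sum()
print(monthly_total)
```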

Data Preparation

In [24]:
seattle_data_calendar_df["date"] = pd.to_datetime(seattle_data_calendar_df["date"])
seattle_data_calendar_df["price"] = seattle_data_calendar_df["price"].replace(r'[\$,]', '', regex=True).astype(float)

Dropping Missing Values

In [25]:
seattle_data_calendar_df["month"] = seattle_data_calendar_df["date"].dt.month
seattle_data_calendar_group_df = seattle_data_calendar_df.loc[:,["month","price"]]
seattle_data_calendar_group_df = seattle_data_calendar_group_df.dropna()
In [26]:
seattle_data_calendar_group_results_df = seattle_data_calendar_group_df.groupby(["month"]).sum()
In [27]:
seattle_data_calendar_group_results_df = seattle_data_calendar_group_results_df.reset_index()
In [28]:
month_dict = {1: "Jan", 2: "Feb", 3: "Mar", 4: "Apr", 5: "May", 6: "Jun",
              7: "Jul", 8: "Aug", 9: "Sep", 10: "Oct", 11: "Nov", 12: "Dec"}
seattle_data_calendar_group_results_df["month_text"] = seattle_data_calendar_group_results_df["month"].map(month_dict)
In [29]:
seattle_data_calendar_group_results_df = seattle_data_calendar_group_results_df.sort_values(by=["price"], ascending=False)
seattle_data_calendar_group_results_df = seattle_data_calendar_group_results_df.reset_index(drop=True)
seattle_data_calendar_group_results_df
Out[29]:
month price month_text
0 12 11949282.0 Dec
1 8 11502179.0 Aug
2 6 11391415.0 Jun
3 10 11296639.0 Oct
4 7 11288732.0 Jul
5 5 11159008.0 May
6 11 11096625.0 Nov
7 9 11065949.0 Sep
8 3 10798161.0 Mar
9 4 10272371.0 Apr
10 2 9113355.0 Feb
11 1 7981548.0 Jan

Analysis

In [30]:
plt.figure(figsize=(8,6))
g = sns.barplot(x=seattle_data_calendar_group_results_df.index, y="price", data=seattle_data_calendar_group_results_df)
g.set(xticklabels=list(seattle_data_calendar_group_results_df["month_text"]))
plt.xlabel("Month")
plt.ylabel("Price")
plt.title("Total Listings Price per month")
plt.show()

Observation

December appears to be the busiest month, which suggests the winter holiday period around Christmas and New Year. August is the next busiest, which suggests the summer school holidays. January is the least busy month.

Business Question 2 : Which neighbourhoods in Seattle provide the most revenue?

Approach: Listings carry neighbourhood information, so I join the listings and calendar data. Grouping the data by neighbourhood and summing the price gives an estimate of how much revenue each neighbourhood makes.
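A minimal sketch of the join and aggregation, assuming made-up frames `toy_listings` and `toy_calendar` (both hypothetical). The `validate` argument of `merge` is an optional safeguard added here, not something the notebook uses.

```python
import pandas as pd

# Hypothetical listings and calendar rows for illustration only
toy_listings = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood": ["Belltown", "Belltown", "Ballard"],
})
toy_calendar = pd.DataFrame({
    "listing_id": [1, 1, 2, 3, 3],
    "price": [100.0, 100.0, 80.0, 60.0, None],
})

# validate="many_to_one" raises if a listing id appears more than once in
# toy_listings, which would otherwise silently duplicate calendar rows
joined = toy_calendar.merge(
    toy_listings, left_on="listing_id", right_on="id",
    how="left", validate="many_to_one",
)

# sum() skips missing prices by default
revenue = (
    joined.groupby("neighbourhood")["price"].sum().sort_values(ascending=False)
)
print(revenue)
```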

In [31]:
seattle_data_neighbourhood = seattle_data_listings_df.loc[:,["id","neighbourhood"]].groupby(["id", "neighbourhood"]).count()
In [32]:
seattle_data_neighbourhood = seattle_data_neighbourhood.reset_index()
In [33]:
seattle_data_neighbourhood
Out[33]:
id neighbourhood
0 3335 Dunlap
1 4291 Roosevelt
2 5682 South Delridge
3 6606 Wallingford
4 7369 Broadway
5 9419 Georgetown
6 9460 First Hill
7 9531 The Junction
8 9534 The Junction
9 9596 Wallingford
10 10385 Maple Leaf
11 10695 Maple Leaf
12 11012 Wallingford
13 11411 Maple Leaf
14 13068 Capitol Hill
15 14386 Green Lake
16 15108 Green Lake
17 17951 The Junction
18 19611 Belltown
19 19619 Belltown
20 19623 Belltown
21 20868 Maple Leaf
22 20927 Ballard
23 20928 Ballard
24 23192 Capitol Hill
25 23356 Holly Park
26 23430 Belltown
27 23919 Windermere
28 24212 Belltown
29 25002 Ballard
... ... ...
3372 10192213 Minor
3373 10204689 Madrona
3374 10205366 Industrial District
3375 10208623 Queen Anne
3376 10210625 Capitol Hill
3377 10211609 First Hill
3378 10211716 First Hill
3379 10211928 Ravenna
3380 10231701 Ballard
3381 10235014 Belltown
3382 10235136 Meadowbrook
3383 10247453 Central Business District
3384 10248139 Minor
3385 10249527 Wallingford
3386 10250735 Capitol Hill
3387 10252110 Genesee
3388 10262971 Ballard
3389 10273158 Central Business District
3390 10279830 Minor
3391 10281965 Montlake
3392 10292753 Belltown
3393 10295151 Capitol Hill
3394 10310373 Queen Anne
3395 10318171 Stevens
3396 10319529 North Beach/Blue Ridge
3397 10332096 Olympic Hills
3398 10334184 Capitol Hill
3399 10339144 Capitol Hill
3400 10339145 Alki
3401 10340165 Greenwood

3402 rows × 2 columns

Joining Listings and Calendar data

In [34]:
seattle_data_neighbourhood_calendar = pd.merge(seattle_data_calendar_df, seattle_data_neighbourhood, left_on='listing_id', right_on="id", how='left')
In [35]:
seattle_data_neighbourhood_calendar
Out[35]:
listing_id date available price month id neighbourhood
0 241032 2016-01-04 t 85.0 1 241032.0 Queen Anne
1 241032 2016-01-05 t 85.0 1 241032.0 Queen Anne
2 241032 2016-01-06 f NaN 1 241032.0 Queen Anne
3 241032 2016-01-07 f NaN 1 241032.0 Queen Anne
4 241032 2016-01-08 f NaN 1 241032.0 Queen Anne
5 241032 2016-01-09 f NaN 1 241032.0 Queen Anne
6 241032 2016-01-10 f NaN 1 241032.0 Queen Anne
7 241032 2016-01-11 f NaN 1 241032.0 Queen Anne
8 241032 2016-01-12 f NaN 1 241032.0 Queen Anne
9 241032 2016-01-13 t 85.0 1 241032.0 Queen Anne
10 241032 2016-01-14 t 85.0 1 241032.0 Queen Anne
11 241032 2016-01-15 f NaN 1 241032.0 Queen Anne
12 241032 2016-01-16 f NaN 1 241032.0 Queen Anne
13 241032 2016-01-17 f NaN 1 241032.0 Queen Anne
14 241032 2016-01-18 t 85.0 1 241032.0 Queen Anne
15 241032 2016-01-19 t 85.0 1 241032.0 Queen Anne
16 241032 2016-01-20 t 85.0 1 241032.0 Queen Anne
17 241032 2016-01-21 f NaN 1 241032.0 Queen Anne
18 241032 2016-01-22 f NaN 1 241032.0 Queen Anne
19 241032 2016-01-23 f NaN 1 241032.0 Queen Anne
20 241032 2016-01-24 t 85.0 1 241032.0 Queen Anne
21 241032 2016-01-25 t 85.0 1 241032.0 Queen Anne
22 241032 2016-01-26 t 85.0 1 241032.0 Queen Anne
23 241032 2016-01-27 t 85.0 1 241032.0 Queen Anne
24 241032 2016-01-28 t 85.0 1 241032.0 Queen Anne
25 241032 2016-01-29 f NaN 1 241032.0 Queen Anne
26 241032 2016-01-30 f NaN 1 241032.0 Queen Anne
27 241032 2016-01-31 f NaN 1 241032.0 Queen Anne
28 241032 2016-02-01 t 85.0 2 241032.0 Queen Anne
29 241032 2016-02-02 t 85.0 2 241032.0 Queen Anne
... ... ... ... ... ... ... ...
1393540 10208623 2016-12-04 f NaN 12 10208623.0 Queen Anne
1393541 10208623 2016-12-05 f NaN 12 10208623.0 Queen Anne
1393542 10208623 2016-12-06 f NaN 12 10208623.0 Queen Anne
1393543 10208623 2016-12-07 f NaN 12 10208623.0 Queen Anne
1393544 10208623 2016-12-08 f NaN 12 10208623.0 Queen Anne
1393545 10208623 2016-12-09 f NaN 12 10208623.0 Queen Anne
1393546 10208623 2016-12-10 f NaN 12 10208623.0 Queen Anne
1393547 10208623 2016-12-11 f NaN 12 10208623.0 Queen Anne
1393548 10208623 2016-12-12 f NaN 12 10208623.0 Queen Anne
1393549 10208623 2016-12-13 f NaN 12 10208623.0 Queen Anne
1393550 10208623 2016-12-14 f NaN 12 10208623.0 Queen Anne
1393551 10208623 2016-12-15 f NaN 12 10208623.0 Queen Anne
1393552 10208623 2016-12-16 f NaN 12 10208623.0 Queen Anne
1393553 10208623 2016-12-17 f NaN 12 10208623.0 Queen Anne
1393554 10208623 2016-12-18 f NaN 12 10208623.0 Queen Anne
1393555 10208623 2016-12-19 f NaN 12 10208623.0 Queen Anne
1393556 10208623 2016-12-20 f NaN 12 10208623.0 Queen Anne
1393557 10208623 2016-12-21 f NaN 12 10208623.0 Queen Anne
1393558 10208623 2016-12-22 f NaN 12 10208623.0 Queen Anne
1393559 10208623 2016-12-23 f NaN 12 10208623.0 Queen Anne
1393560 10208623 2016-12-24 f NaN 12 10208623.0 Queen Anne
1393561 10208623 2016-12-25 f NaN 12 10208623.0 Queen Anne
1393562 10208623 2016-12-26 f NaN 12 10208623.0 Queen Anne
1393563 10208623 2016-12-27 f NaN 12 10208623.0 Queen Anne
1393564 10208623 2016-12-28 f NaN 12 10208623.0 Queen Anne
1393565 10208623 2016-12-29 f NaN 12 10208623.0 Queen Anne
1393566 10208623 2016-12-30 f NaN 12 10208623.0 Queen Anne
1393567 10208623 2016-12-31 f NaN 12 10208623.0 Queen Anne
1393568 10208623 2017-01-01 f NaN 1 10208623.0 Queen Anne
1393569 10208623 2017-01-02 f NaN 1 10208623.0 Queen Anne

1393570 rows × 7 columns

In [36]:
seattle_data_neighbourhood_calendar_group_results_df = seattle_data_neighbourhood_calendar.loc[:,["neighbourhood","price"]].groupby(["neighbourhood"]).sum()
In [37]:
seattle_data_neighbourhood_calendar_group_results_df = seattle_data_neighbourhood_calendar_group_results_df.reset_index()

Analysis

In [38]:
seattle_data_neighbourhood_sorted_df = seattle_data_neighbourhood_calendar_group_results_df.nlargest(10,'price')
seattle_data_neighbourhood_sorted_df = seattle_data_neighbourhood_sorted_df.reset_index(drop=True)
seattle_data_neighbourhood_sorted_df
Out[38]:
neighbourhood price
0 Capitol Hill 10753516.0
1 Belltown 9545080.0
2 Queen Anne 8249113.0
3 Ballard 6744518.0
4 Minor 6649223.0
5 Fremont 5264507.0
6 Wallingford 4773675.0
7 Central Business District 4501319.0
8 First Hill 3551078.0
9 Magnolia 3156841.0
In [39]:
plt.figure(figsize=(8,6))
g = sns.barplot(x="price", y="neighbourhood", data=seattle_data_neighbourhood_sorted_df)
plt.xlabel("Price")
plt.ylabel("Neighbourhood")
plt.title("Top 10 Neighbourhoods by Listings Revenue")
plt.show()

Observation:

The Capitol Hill neighbourhood appears to make the most revenue. Either it has more listings available or its listings are priced higher.
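The two explanations in the observation can be told apart by looking at listing counts and mean prices next to the total. The sketch below uses a made-up joined frame (`toy_joined` is hypothetical) in which neighbourhood A leads on total price only because it has more listings.

```python
import pandas as pd

# Hypothetical joined calendar/listings rows for illustration only
toy_joined = pd.DataFrame({
    "listing_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "neighbourhood": ["A", "A", "A", "A", "A", "A", "B", "B"],
    "price": [50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 120.0, 120.0],
})

# One row per neighbourhood: total price, distinct listings, mean nightly price
summary = toy_joined.groupby("neighbourhood").agg(
    total_price=("price", "sum"),
    n_listings=("listing_id", "nunique"),
    mean_price=("price", "mean"),
)
print(summary)
```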

Data Understanding

Understanding the data type of each column in listings : Getting a list of Numeric Columns

In [40]:
## Find the data type of each variable
verbose = True

numeric_cols = []
numeric_cols_with_id = []
# Loop variable is `column`, not `col`, to avoid shadowing the matplotlib.colors alias
for column in seattle_data_listings_df.columns:
    s = seattle_data_listings_df[column]

    # Skip identifiers and coordinates
    if column in {'scrape_id', 'host_id', 'latitude', 'longitude'}:
        continue

    if s.dtype == object or s.dtype.name == 'category':
        col_type = 'Category'
    else:
        col_type = 'Numeric'
        numeric_cols_with_id.append(column)
        # 'id' is numeric but carries no information about the listing
        if column != 'id':
            numeric_cols.append(column)

    if verbose:
        print('* {} - {}'.format(column.strip(), col_type))

print('* Numeric columns {}'.format(numeric_cols))
* id - Numeric
* listing_url - Category
* name - Category
* summary - Category
* space - Category
* description - Category
* neighborhood_overview - Category
* notes - Category
* transit - Category
* thumbnail_url - Category
* medium_url - Category
* picture_url - Category
* xl_picture_url - Category
* host_url - Category
* host_name - Category
* host_since - Category
* host_location - Category
* host_about - Category
* host_response_time - Category
* host_response_rate - Numeric
* host_acceptance_rate - Category
* host_is_superhost - Category
* host_thumbnail_url - Category
* host_picture_url - Category
* host_neighbourhood - Category
* host_listings_count - Numeric
* host_total_listings_count - Numeric
* host_verifications - Category
* host_has_profile_pic - Category
* host_identity_verified - Category
* street - Category
* neighbourhood - Category
* neighbourhood_cleansed - Category
* neighbourhood_group_cleansed - Category
* city - Category
* state - Category
* zipcode - Category
* smart_location - Category
* is_location_exact - Category
* property_type - Category
* room_type - Category
* accommodates - Numeric
* bathrooms - Numeric
* bedrooms - Numeric
* beds - Numeric
* bed_type - Category
* amenities - Category
* square_feet - Numeric
* price - Numeric
* weekly_price - Category
* monthly_price - Category
* security_deposit - Category
* cleaning_fee - Category
* guests_included - Numeric
* extra_people - Category
* minimum_nights - Numeric
* maximum_nights - Numeric
* calendar_updated - Category
* availability_30 - Numeric
* availability_60 - Numeric
* availability_90 - Numeric
* availability_365 - Numeric
* number_of_reviews - Numeric
* first_review - Category
* last_review - Category
* review_scores_rating - Numeric
* review_scores_accuracy - Numeric
* review_scores_cleanliness - Numeric
* review_scores_checkin - Numeric
* review_scores_communication - Numeric
* review_scores_location - Numeric
* review_scores_value - Numeric
* instant_bookable - Category
* cancellation_policy - Category
* require_guest_profile_picture - Category
* require_guest_phone_verification - Category
* calculated_host_listings_count - Numeric
* reviews_per_month - Numeric
* host_since_in_years - Numeric
* Numeric columns ['host_response_rate', 'host_listings_count', 'host_total_listings_count', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'square_feet', 'price', 'guests_included', 'minimum_nights', 'maximum_nights', 'availability_30', 'availability_60', 'availability_90', 'availability_365', 'number_of_reviews', 'review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 'review_scores_value', 'calculated_host_listings_count', 'reviews_per_month', 'host_since_in_years']

Approach : After extracting the numeric columns, I check whether there is any linear relationship between the numeric variables and price by plotting a clustered correlation heatmap.

Analysis : Correlation between numeric variables and price

In [41]:
fig_size = (20, 20)
fig = sns.clustermap(seattle_data_listings_df.loc[:,numeric_cols].corr().fillna(0.0), annot=True, figsize=fig_size,cmap="YlGnBu")
plt.setp(fig.ax_heatmap.get_yticklabels(), rotation=0)
plt.show()

Observation

From the hierarchical correlation plot above, one can observe blocks of variables that are correlated with each other. Take the first block, which contains the price field: price is correlated with bathrooms, bedrooms, accommodates (the number of people a listing accommodates), beds, guests included and square feet. There is a negative correlation between reviews per month and price, indicating that higher-priced properties have fewer reviews.
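To read the price relationships without scanning the whole heatmap, the price column of the correlation matrix can be sorted on its own. This is a sketch on made-up numbers (`toy_listings` is hypothetical), so the values it prints say nothing about the Seattle data.

```python
import pandas as pd

# Hypothetical numeric listing columns for illustration only
toy_listings = pd.DataFrame({
    "price": [60.0, 90.0, 150.0, 240.0, 300.0],
    "bedrooms": [1, 1, 2, 3, 4],
    "accommodates": [2, 2, 4, 6, 8],
    "reviews_per_month": [4.0, 3.5, 2.0, 1.0, 0.5],
})

# Pearson correlation of every numeric column with price, strongest first
price_corr = (
    toy_listings.corr()["price"].drop("price").sort_values(ascending=False)
)
print(price_corr)
```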

Business Question 4 : Does location / neighbourhood have an effect on price? Do listings or prices show any pattern on the location map?

Approach: From the percentiles obtained for price, we divide the price into 3 ranges: low, medium and high. Any value below the 25th percentile is low, any value between the 25th and 75th percentiles is medium, and any value above the 75th percentile is high; the code below uses 75 and 150 as these cut-offs. We then plot the listings and their price ranges on the map by longitude and latitude. The idea is to see whether some locations have more low-priced listings and whether others have more medium- and high-priced listings.
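The binning described above can also be written with `pd.cut`, which makes the bin edges explicit. This is a sketch under assumptions: `label_prices` is a hypothetical helper and the prices are made up. The quantile lines only show how the cut-offs could be derived from data rather than hard-coded.

```python
import numpy as np
import pandas as pd

# Hypothetical prices for illustration only
prices = pd.Series([40.0, 75.0, 100.0, 149.0, 150.0, 300.0])

def label_prices(prices, low_cut, high_cut):
    """Label prices 1 (low), 2 (medium) or 3 (high) using half-open bins."""
    return pd.cut(
        prices,
        bins=[-np.inf, low_cut, high_cut, np.inf],
        labels=[1, 2, 3],
        right=False,  # bins are [low, high), so a price equal to a cut moves up
    ).astype(int)

# Fixed cut-offs of 75 and 150
labels = label_prices(prices, 75, 150)
print(labels.tolist())

# Cut-offs derived from the data instead of hard-coded
q25, q75 = prices.quantile([0.25, 0.75])
print(q25, q75)
```

Missing prices would need handling before the final `astype(int)`, since `pd.cut` leaves them unlabelled.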

In [42]:
def get_class_label(price):
    """
    Return a class label based on price ranges.
    price: listing price value
    returns: class label (1 = low, 2 = medium, 3 = high)
    """
    if price < 75:
        return 1
    elif price < 150:
        return 2
    else:
        return 3
In [43]:
   
seattle_data_listings_df['price_label'] = seattle_data_listings_df['price'].apply(get_class_label)

lat_long_df = pd.DataFrame()

lat_long_df['latitude'] = seattle_data_listings_df['latitude']
lat_long_df['longitude'] = seattle_data_listings_df['longitude']
lat_long_df['cluster'] = seattle_data_listings_df['price_label']
C:\Program Files\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app
In [44]:
lat_long_df
Out[44]:
latitude longitude cluster
0 47.636289 -122.371025 2
1 47.639123 -122.365666 3
2 47.629724 -122.369483 3
3 47.638473 -122.369279 2
4 47.632918 -122.372471 3
5 47.630525 -122.366174 2
6 47.636605 -122.368519 2
7 47.640161 -122.375856 1
8 47.632410 -122.357216 2
9 47.637492 -122.366889 3
10 47.635482 -122.358478 2
11 47.637214 -122.360046 2
12 47.635546 -122.373171 3
13 47.629507 -122.367629 3
14 47.639203 -122.365863 3
15 47.635650 -122.372893 2
16 47.626200 -122.366602 1
17 47.640646 -122.372406 2
18 47.639776 -122.372235 3
19 47.636025 -122.358694 3
20 47.627940 -122.364959 2
21 47.636374 -122.361033 2
22 47.635632 -122.358881 3
23 47.633394 -122.371920 2
24 47.638517 -122.369581 3
25 47.631397 -122.367767 3
26 47.638752 -122.367973 3
27 47.639266 -122.374726 1
28 47.629907 -122.368948 3
29 47.639816 -122.374338 2
... ... ... ...
3788 47.656739 -122.344865 2
3789 47.661171 -122.349923 1
3790 47.649271 -122.347817 3
3791 47.659149 -122.363466 2
3792 47.663040 -122.348499 2
3793 47.660056 -122.357683 2
3794 47.654199 -122.359987 2
3795 47.650124 -122.343283 3
3796 47.656508 -122.360571 2
3797 47.658976 -122.354915 1
3798 47.655381 -122.343205 2
3799 47.652902 -122.352110 1
3800 47.663613 -122.349519 3
3801 47.657078 -122.358795 2
3802 47.650765 -122.347787 3
3803 47.661131 -122.349238 3
3804 47.660856 -122.352401 2
3805 47.653158 -122.356285 2
3806 47.656057 -122.354514 3
3807 47.654304 -122.342720 2
3808 47.662036 -122.350485 3
3809 47.648689 -122.343915 2
3810 47.654205 -122.352604 3
3811 47.657898 -122.346692 1
3812 47.654516 -122.358124 2
3813 47.664295 -122.359170 3
3814 47.649552 -122.318309 2
3815 47.508453 -122.240607 2
3816 47.632335 -122.275530 2
3817 47.641186 -122.342085 2

3818 rows × 3 columns

In [45]:
m = folium.Map(location=[47.732647, -122.341301], zoom_start=7)
In [46]:
colors = [
    'pink',
    'blue',
    'green',
    'orange', 
    'black',
    'orange',
    'beige',
    'green',
    'darkgreen',
    'lightgreen',
    'darkblue',
    'lightblue',
    'purple',
    'darkpurple',
    'darkred',
    'cadetblue',
    'gray',
    'lightred'
]

def get_popup_text(label):
    """
    returns meaningful description of popup
    label : price label value
    """
    if label==1:
        return 'low'
    elif label==2:
        return 'medium'
    else:
        return 'high'

# Add one marker per listing, coloured and labelled by its price range
for row in lat_long_df.itertuples(index=False):
    label = int(row.cluster)
    folium.Marker(
        [row.latitude, row.longitude],
        popup=get_popup_text(label),
        icon=folium.Icon(color=colors[label]),
    ).add_to(m)
    
legend_html = '''
     <div style="position: fixed; bottom: 50px; left: 50px; width: 100px; height: 90px; border:2px solid grey; z-index:9999; font-size:14px;">&nbsp; Low &nbsp; <i class="fa fa-map-marker fa-2x" style="color:#5DADE2"></i><br>&nbsp; Medium &nbsp; <i class="fa fa-map-marker fa-2x" style="color:#64C714"></i><br>&nbsp; High Price <i class="fa fa-map-marker fa-2x" style="color:#D68910"></i></div>'''

m.get_root().html.add_child(folium.Element(legend_html))
Out[46]:
<branca.element.Element at 0x1b1a689e748>
In [47]:
m
Out[47]:

In [48]:
m.save('seattle_folium_map.html')

Observation: Studying the map, one can observe where low, medium and high priced listings are prevalent. For instance, around the University of Washington there are many low and medium priced listings. Around the Capitol Hill area, medium and high priced listings are more prevalent.

Business Question 5 : Can we predict price ranges (low/medium/high) from property, host and review information?

Approach : In the earlier section, I considered only the numeric variables to understand their effect on price. Now I combine them with categorical variables from the property, host and review information and check whether the price range of a property (low / medium / high) can be predicted. Given the small volume of data, I treat this as a classification problem over price ranges instead of predicting the price itself. I chose the Random Forest classifier, a tree-based algorithm that can handle a mix of numeric and one-hot encoded categorical features. The data is split into 80% train and 20% test sets. 5-fold cross-validation is run on the training data for different numbers of estimators, with candidates scored by macro-averaged F1. The best number of estimators is then used to train the model on the training set, and the model is tested on the remaining 20% of the data, with accuracy as the evaluation metric.
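The workflow described above can be sketched end to end on synthetic data. Everything here is an assumption made for illustration: the data comes from `make_classification`, the grid is shortened to keep the run fast, and the final model is taken from `best_estimator_` so that it has the same settings that were cross-validated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic 3-class data standing in for the listing features
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
# 80% train / 20% test, keeping class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# 5-fold cross-validation over the number of trees, scored by macro F1
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", max_depth=5, random_state=0),
    param_grid={"n_estimators": [10, 25]},
    scoring="f1_macro",
    cv=5,
)
search.fit(X_train, y_train)

# best_estimator_ is refit on the whole training set with the winning settings
test_accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print(search.best_params_, round(test_accuracy, 3))
```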

In [49]:
data_df = seattle_data_listings_df.loc[:,['property_type', 'room_type', 'accommodates', 'host_response_time', 'host_response_rate', 'bathrooms', 'bedrooms', 'beds', 'bed_type', 'host_listings_count', 'guests_included', 'number_of_reviews','minimum_nights','maximum_nights', 'review_scores_rating', 'review_scores_accuracy', 'review_scores_cleanliness','review_scores_checkin', 'review_scores_communication','review_scores_location', 'review_scores_value','reviews_per_month','availability_365','host_since_in_years','price_label']]
In [50]:
# data_df.loc[data_df['host_response_time'].isnull(),'host_response_time']='missing'
# data_df.loc[data_df['host_response_rate'].isnull(),'host_response_rate']=0
# data_df.loc[data_df['review_scores_rating'].isnull(),'review_scores_rating']=0
# data_df.loc[data_df['review_scores_accuracy'].isnull(),'review_scores_accuracy']=0
# data_df.loc[data_df['review_scores_cleanliness'].isnull(),'review_scores_cleanliness']=0
# data_df.loc[data_df['review_scores_checkin'].isnull(),'review_scores_checkin']=0
# data_df.loc[data_df['review_scores_communication'].isnull(),'review_scores_communication']=0
# data_df.loc[data_df['review_scores_location'].isnull(),'review_scores_location']=0
# data_df.loc[data_df['review_scores_value'].isnull(),'review_scores_value']=0
# data_df.loc[data_df['reviews_per_month'].isnull(),'reviews_per_month']=0
In [51]:
#data_df.isnull().sum()

Handle Categorical Data

In [52]:
data_df = pd.get_dummies(data_df)
In [53]:
data_df
Out[53]:
accommodates host_response_rate bathrooms bedrooms beds host_listings_count guests_included number_of_reviews minimum_nights maximum_nights ... room_type_Shared room host_response_time_a few days or more host_response_time_within a day host_response_time_within a few hours host_response_time_within an hour bed_type_Airbed bed_type_Couch bed_type_Futon bed_type_Pull-out Sofa bed_type_Real Bed
0 4 96.0 1.0 1.0 1.0 3.0 2 207 1 365 ... 0 0 0 1 0 0 0 0 0 1
1 4 98.0 1.0 1.0 1.0 6.0 1 43 2 90 ... 0 0 0 0 1 0 0 0 0 1
2 11 67.0 4.5 5.0 7.0 2.0 10 20 4 30 ... 0 0 0 1 0 0 0 0 0 1
3 3 NaN 1.0 0.0 2.0 1.0 1 0 1 1125 ... 0 0 0 0 0 0 0 0 0 1
4 6 100.0 2.0 3.0 3.0 2.0 6 38 1 1125 ... 0 0 0 0 1 0 0 0 0 1
5 2 NaN 1.0 1.0 1.0 1.0 1 17 1 6 ... 0 0 0 0 0 0 0 0 0 1
6 2 100.0 1.0 1.0 1.0 1.0 1 58 3 14 ... 0 0 0 0 1 0 0 0 0 1
7 2 100.0 1.0 1.0 1.0 5.0 1 173 2 7 ... 0 0 0 0 1 0 0 0 0 1
8 2 NaN 1.0 1.0 1.0 1.0 1 8 3 1125 ... 0 0 0 0 0 0 0 0 0 1
9 4 98.0 1.0 1.0 1.0 6.0 1 32 2 365 ... 0 0 0 0 1 0 0 0 0 1
10 2 100.0 1.0 1.0 1.0 1.0 2 181 3 7 ... 0 0 0 1 0 0 0 0 0 1
11 2 100.0 1.0 1.0 1.0 1.0 1 8 3 1125 ... 0 0 0 1 0 0 0 0 0 1
12 4 100.0 1.0 2.0 3.0 1.0 1 13 3 14 ... 0 0 0 0 1 0 0 0 0 1
13 5 100.0 1.0 2.0 3.0 1.0 4 3 2 1125 ... 0 0 0 0 1 0 0 0 0 1
14 16 100.0 3.5 6.0 15.0 1.0 8 18 3 29 ... 0 0 0 0 1 0 0 0 0 1
15 5 100.0 1.0 2.0 2.0 1.0 1 1 3 15 ... 0 0 0 1 0 0 0 0 0 1
16 2 100.0 1.0 1.0 1.0 1.0 1 84 1 10 ... 0 0 0 0 1 0 0 0 0 1
17 2 100.0 1.0 1.0 1.0 1.0 1 45 2 14 ... 0 0 0 0 1 0 0 0 0 1
18 2 NaN 2.0 1.0 1.0 1.0 1 0 1 1125 ... 0 0 0 0 0 0 0 1 0 0
19 10 NaN 3.5 5.0 5.0 1.0 1 5 3 30 ... 0 0 0 0 0 0 0 0 0 1
20 1 100.0 1.0 1.0 1.0 1.0 1 1 1 1125 ... 0 0 0 1 0 0 0 0 0 1
21 3 100.0 1.5 1.0 2.0 1.0 1 11 1 1125 ... 0 0 0 0 1 0 0 0 0 1
22 8 100.0 2.0 4.0 4.0 1.0 1 1 4 1125 ... 0 0 1 0 0 0 0 0 0 1
23 2 NaN 1.0 1.0 1.0 1.0 1 0 1 1125 ... 0 0 0 0 0 0 0 0 0 1
24 5 NaN 2.5 3.0 3.0 1.0 1 1 3 1125 ... 0 0 0 0 0 0 0 0 0 1
25 8 67.0 2.5 3.0 5.0 2.0 6 8 4 1125 ... 0 0 0 1 0 0 0 0 0 1
26 5 100.0 1.0 2.0 3.0 1.0 1 0 1 1125 ... 0 0 0 1 0 0 0 0 0 1
27 2 100.0 1.0 1.0 1.0 5.0 1 80 2 7 ... 0 0 0 0 1 0 0 0 0 1
28 6 100.0 2.0 3.0 3.0 5.0 1 6 2 1125 ... 0 0 0 0 1 0 0 0 0 1
29 3 71.0 1.0 1.0 2.0 1.0 3 4 2 1125 ... 0 0 0 0 1 0 0 0 0 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3788 4 100.0 1.0 1.0 2.0 1.0 2 14 1 1125 ... 0 0 0 0 1 0 0 0 0 1
3789 2 100.0 1.0 1.0 1.0 2.0 2 23 1 14 ... 0 0 0 1 0 0 0 0 0 1
3790 6 100.0 1.0 2.0 3.0 1.0 4 75 2 29 ... 0 0 0 0 1 0 0 0 0 1
3791 2 100.0 1.0 1.0 1.0 1.0 1 11 2 30 ... 0 0 0 1 0 0 0 0 0 1
3792 4 100.0 1.0 1.0 1.0 1.0 1 0 2 1125 ... 0 0 0 0 1 0 0 0 0 1
3793 4 100.0 1.0 0.0 2.0 1.0 4 78 2 180 ... 0 0 0 0 1 0 0 0 0 1
3794 2 80.0 1.0 1.0 1.0 1.0 2 21 2 1125 ... 0 0 0 1 0 0 0 0 0 1
3795 4 100.0 1.0 0.0 1.0 1.0 1 10 2 1125 ... 0 0 0 0 1 0 0 0 0 1
3796 2 100.0 1.0 1.0 1.0 1.0 1 233 2 30 ... 0 0 0 0 1 0 0 0 0 1
3797 2 100.0 1.0 0.0 1.0 1.0 2 1 3 1125 ... 0 0 0 0 1 0 0 1 0 0
3798 2 100.0 1.0 1.0 1.0 2.0 2 128 2 14 ... 0 0 0 0 1 0 0 0 0 1
3799 2 25.0 1.0 1.0 1.0 1.0 1 5 2 1125 ... 0 1 0 0 0 0 0 0 0 1
3800 2 NaN 1.5 1.0 1.0 1.0 1 0 1 1125 ... 0 0 0 0 0 0 0 0 0 1
3801 2 100.0 1.0 1.0 1.0 1.0 2 28 4 1125 ... 0 0 0 1 0 0 0 0 0 1
3802 4 NaN 1.0 2.0 2.0 1.0 1 0 2 1125 ... 0 0 0 0 0 0 0 0 0 1
3803 16 97.0 3.5 4.0 10.0 9.0 12 4 1 1125 ... 0 0 0 1 0 0 0 0 0 1
3804 2 100.0 1.0 0.0 1.0 1.0 1 12 2 1125 ... 0 0 0 1 0 0 0 0 0 1
3805 6 100.0 1.0 2.0 2.0 1.0 4 10 2 30 ... 0 0 0 0 1 0 0 0 0 1
3806 4 100.0 1.0 2.0 2.0 2.0 1 1 1 1125 ... 0 0 0 0 1 0 0 0 0 1
3807 2 100.0 1.0 0.0 1.0 1.0 2 7 1 360 ... 0 0 1 0 0 0 0 0 0 1
3808 6 100.0 2.0 3.0 3.0 2.0 4 29 3 1125 ... 0 0 0 1 0 0 0 0 0 1
3809 4 100.0 1.0 1.0 1.0 1.0 2 10 4 30 ... 0 0 1 0 0 0 0 0 0 1
3810 5 100.0 1.0 2.0 3.0 1.0 4 5 2 14 ... 0 0 0 0 1 0 0 0 0 1
3811 3 100.0 1.0 1.0 1.0 1.0 1 2 1 1125 ... 0 0 0 1 0 0 0 0 0 1
3812 4 100.0 1.0 1.0 2.0 3.0 2 73 3 365 ... 0 0 0 0 1 0 0 0 0 1
3813 6 99.0 2.0 3.0 3.0 354.0 1 1 3 1125 ... 0 0 0 1 0 0 0 0 0 1
3814 4 100.0 1.0 1.0 2.0 1.0 3 2 2 29 ... 0 0 0 0 1 0 0 0 0 1
3815 2 NaN 1.0 1.0 1.0 1.0 2 0 1 7 ... 0 0 0 0 0 0 0 0 0 1
3816 2 100.0 1.0 0.0 1.0 1.0 1 0 3 1125 ... 0 0 0 0 1 0 0 0 0 1
3817 3 100.0 1.5 2.0 1.0 1.0 1 0 1 1125 ... 0 0 1 0 0 0 0 0 0 1

3818 rows × 49 columns

In [54]:
columns_minus_price = list(set(data_df.columns)-set(['price','price_label']))
In [55]:
columns_minus_price
Out[55]:
['property_type_Dorm',
 'property_type_Treehouse',
 'room_type_Shared room',
 'property_type_House',
 'host_response_rate',
 'beds',
 'bedrooms',
 'property_type_Condominium',
 'bed_type_Futon',
 'bathrooms',
 'bed_type_Couch',
 'property_type_Cabin',
 'host_since_in_years',
 'property_type_Townhouse',
 'reviews_per_month',
 'host_listings_count',
 'host_response_time_within a day',
 'review_scores_checkin',
 'review_scores_communication',
 'host_response_time_within an hour',
 'review_scores_rating',
 'property_type_Other',
 'bed_type_Airbed',
 'property_type_Camper/RV',
 'review_scores_cleanliness',
 'property_type_Apartment',
 'host_response_time_a few days or more',
 'property_type_Chalet',
 'accommodates',
 'review_scores_location',
 'number_of_reviews',
 'review_scores_accuracy',
 'room_type_Private room',
 'host_response_time_within a few hours',
 'room_type_Entire home/apt',
 'bed_type_Pull-out Sofa',
 'maximum_nights',
 'availability_365',
 'property_type_Loft',
 'guests_included',
 'property_type_Boat',
 'minimum_nights',
 'review_scores_value',
 'property_type_Yurt',
 'property_type_Bed & Breakfast',
 'property_type_Tent',
 'property_type_Bungalow',
 'bed_type_Real Bed']
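The set difference used above does not guarantee any particular column order, which is why the list no longer follows the order of the data frame. If a stable feature order is wanted, one option is `Index.drop`; this is a sketch on a made-up frame (`toy_df` is hypothetical).

```python
import pandas as pd

# Hypothetical encoded frame for illustration only
toy_df = pd.DataFrame({
    "accommodates": [2, 4],
    "price_label": [1, 3],
    "bedrooms": [1, 2],
    "room_type_Private room": [1, 0],
})

# Index.drop keeps the remaining columns in their original order;
# errors="ignore" tolerates labels such as 'price' that are absent
feature_columns = toy_df.columns.drop(
    ["price", "price_label"], errors="ignore"
).tolist()
print(feature_columns)
```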

Dropping nulls

In [56]:
data_df = data_df.dropna()
In [57]:
data_df.shape
Out[57]:
(2834, 49)
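Dropping nulls leaves 2834 of the 3818 rows. The commented-out cell earlier points to an alternative, filling missing values instead of dropping rows. The sketch below shows that idea on a made-up frame before encoding (`toy_df` and `fill_values` are hypothetical); whether zero-filled scores help the model is an open modelling choice, not something shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical listing rows with gaps, for illustration only
toy_df = pd.DataFrame({
    "host_response_time": ["within an hour", np.nan, "within a day"],
    "host_response_rate": [100.0, np.nan, 90.0],
    "review_scores_rating": [95.0, np.nan, np.nan],
    "price_label": [2, 1, 3],
})

# Fill values mirror the commented-out cell: 'missing' for the category, 0 for scores
fill_values = {
    "host_response_time": "missing",
    "host_response_rate": 0,
    "review_scores_rating": 0,
}
filled = toy_df.fillna(value=fill_values)

# Rows kept by dropna() before and after filling
rows_before = len(toy_df.dropna())
rows_after = len(filled.dropna())
print(rows_before, rows_after)
```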

Classification algorithm to detect price ranges

In [58]:
best_params = {}
parameters = {'n_estimators': [25, 50, 100, 150, 200, 250]}

X_train, X_test, y_train, y_test = train_test_split(data_df.loc[:,columns_minus_price], data_df['price_label'],
                                                    train_size=0.8,
                                                    random_state=0)

rf_clf = RandomForestClassifier(class_weight='balanced', random_state=0, max_depth=5)
grid_search = GridSearchCV(rf_clf, parameters, scoring='f1_macro', cv=5, verbose=10)
grid_search.fit(X_train, y_train)
best_params = grid_search.best_params_
C:\Program Files\Anaconda3\lib\site-packages\sklearn\model_selection\_split.py:2026: FutureWarning: From version 0.21, test_size will always complement train_size unless both are specified.
  FutureWarning)
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.1s remaining:    0.0s
Fitting 5 folds for each of 6 candidates, totalling 30 fits
[CV] n_estimators=25 .................................................
[CV] ........ n_estimators=25, score=0.6959448160535118, total=   0.0s
[CV] n_estimators=25 .................................................
[CV] ........ n_estimators=25, score=0.7005026615124307, total=   0.0s
[CV] n_estimators=25 .................................................
[CV] ........ n_estimators=25, score=0.6986157875046765, total=   0.0s
[CV] n_estimators=25 .................................................
[CV] ........ n_estimators=25, score=0.6815010209867816, total=   0.0s
[CV] n_estimators=25 .................................................
[CV] ........ n_estimators=25, score=0.6928470700852172, total=   0.0s
[CV] n_estimators=50 .................................................
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.1s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   5 out of   5 | elapsed:    0.2s remaining:    0.0s
[CV] ........ n_estimators=50, score=0.7004459308807135, total=   0.0s
[CV] n_estimators=50 .................................................
[CV] ........ n_estimators=50, score=0.6982357874166412, total=   0.0s
[CV] n_estimators=50 .................................................
[Parallel(n_jobs=1)]: Done   6 out of   6 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   7 out of   7 | elapsed:    0.5s remaining:    0.0s
[CV] ........ n_estimators=50, score=0.6986044816091365, total=   0.0s
[CV] n_estimators=50 .................................................
[CV] ........ n_estimators=50, score=0.6926764251773384, total=   0.0s
[CV] n_estimators=50 .................................................
[Parallel(n_jobs=1)]: Done   8 out of   8 | elapsed:    0.6s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   9 out of   9 | elapsed:    0.7s remaining:    0.0s
[CV] ......... n_estimators=50, score=0.692474313291538, total=   0.0s
[CV] n_estimators=100 ................................................
[CV] ....... n_estimators=100, score=0.6943422519509476, total=   0.1s
[CV] n_estimators=100 ................................................
[CV] ....... n_estimators=100, score=0.7020540127813056, total=   0.1s
[CV] n_estimators=100 ................................................
[CV] ....... n_estimators=100, score=0.7006447513182102, total=   0.1s
[CV] n_estimators=100 ................................................
[CV] ........ n_estimators=100, score=0.686453634085213, total=   0.1s
[CV] n_estimators=100 ................................................
[CV] ........ n_estimators=100, score=0.693909845180053, total=   0.1s
[CV] n_estimators=150 ................................................
[CV] ....... n_estimators=150, score=0.6987974865511782, total=   0.2s
[CV] n_estimators=150 ................................................
[CV] ....... n_estimators=150, score=0.7020540127813056, total=   0.2s
[CV] n_estimators=150 ................................................
[CV] ....... n_estimators=150, score=0.7047337060448386, total=   0.2s
[CV] n_estimators=150 ................................................
[CV] ....... n_estimators=150, score=0.6840736530753945, total=   0.2s
[CV] n_estimators=150 ................................................
[CV] ....... n_estimators=150, score=0.6913389735230516, total=   0.2s
[CV] n_estimators=200 ................................................
[CV] ....... n_estimators=200, score=0.7049470457079153, total=   0.2s
[CV] n_estimators=200 ................................................
[CV] ....... n_estimators=200, score=0.6997351201473515, total=   0.3s
[CV] n_estimators=200 ................................................
[CV] ....... n_estimators=200, score=0.7067826803026037, total=   0.3s
[CV] n_estimators=200 ................................................
[CV] ....... n_estimators=200, score=0.6840736530753945, total=   0.2s
[CV] n_estimators=200 ................................................
[CV] ....... n_estimators=200, score=0.6913389735230516, total=   0.3s
[CV] n_estimators=250 ................................................
[CV] ....... n_estimators=250, score=0.7028938263878022, total=   0.3s
[CV] n_estimators=250 ................................................
[CV] ....... n_estimators=250, score=0.6955486541726565, total=   0.4s
[CV] n_estimators=250 ................................................
[CV] ......... n_estimators=250, score=0.70883484772118, total=   0.3s
[CV] n_estimators=250 ................................................
[CV] ........ n_estimators=250, score=0.684383850358954, total=   0.3s
[CV] n_estimators=250 ................................................
[CV] ....... n_estimators=250, score=0.6913389735230516, total=   0.3s
[Parallel(n_jobs=1)]: Done  30 out of  30 | elapsed:    7.8s finished
In [59]:
best_params
Out[59]:
{'n_estimators': 200}
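The fold scores printed in the log above can be averaged by hand to see how close the candidates really are. The following is a minimal sketch that uses only the scores visible in the log for n_estimators from 100 to 250; the names `fold_scores` and `summary` are introduced here for illustration and are not part of the notebook.

```python
from statistics import mean, stdev

# Per-fold scores copied from the [CV] log lines above
# (only the settings 100, 150, 200 and 250 are included here).
fold_scores = {
    100: [0.6943422519509476, 0.7020540127813056, 0.7006447513182102,
          0.686453634085213, 0.693909845180053],
    150: [0.6987974865511782, 0.7020540127813056, 0.7047337060448386,
          0.6840736530753945, 0.6913389735230516],
    200: [0.7049470457079153, 0.6997351201473515, 0.7067826803026037,
          0.6840736530753945, 0.6913389735230516],
    250: [0.7028938263878022, 0.6955486541726565, 0.70883484772118,
          0.684383850358954, 0.6913389735230516],
}

# Mean and sample standard deviation of the five folds per setting.
summary = {n: (mean(s), stdev(s)) for n, s in fold_scores.items()}
for n, (m, sd) in summary.items():
    print(f"n_estimators={n}: mean={m:.4f}, std={sd:.4f}")

# Setting with the highest mean score, and the gap between best and worst mean.
best_n = max(summary, key=lambda n: summary[n][0])
means = [m for m, _ in summary.values()]
spread = max(means) - min(means)
print("best:", best_n, "spread of means:", round(spread, 4))
```

The means differ by roughly 0.002, which is smaller than the fold-to-fold variation, so the choice of 200 over its neighbours should probably be read as a weak preference rather than a clear win.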
In [60]:
best_model = RandomForestClassifier(n_estimators=best_params['n_estimators'], random_state=0, class_weight="balanced")
#best_model = RandomForestClassifier(n_estimators=best_params['n_estimators'])
best_model.fit(X_train, y_train)
Out[60]:
RandomForestClassifier(bootstrap=True, class_weight='balanced',
            criterion='gini', max_depth=None, max_features='auto',
            max_leaf_nodes=None, min_impurity_decrease=0.0,
            min_impurity_split=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            n_estimators=200, n_jobs=1, oob_score=False, random_state=0,
            verbose=0, warm_start=False)
In [61]:
# Evaluate the model on the training set
from sklearn.metrics import accuracy_score

p = best_model.predict(X_train)
print("Training Accuracy", accuracy_score(y_train, p))   # sanity check
Training Accuracy 1.0
In [62]:
# Evaluate the model on the held-out test set
p = best_model.predict(X_test)
print("Test Accuracy", accuracy_score(y_test, p))
Test Accuracy 0.7707231040564374
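A single accuracy figure does not show which price ranges are confused with each other, and a training accuracy of 1.0 next to a lower test accuracy hints at overfitting. The sketch below shows one way to look at both, using a confusion matrix and a per-class report. It runs on synthetic three-class data as a stand-in for the notebook's features and labels; `X_demo`, `y_demo` and `demo_model` are hypothetical names, and the numbers it prints say nothing about the Seattle data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the listing features (assumption).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

# Same estimator settings as the notebook's best_model.
demo_model = RandomForestClassifier(
    n_estimators=200, random_state=0, class_weight="balanced"
)
demo_model.fit(X_tr, y_tr)

# Gap between training and test accuracy as a rough overfitting indicator.
train_acc = accuracy_score(y_tr, demo_model.predict(X_tr))
pred_te = demo_model.predict(X_te)
test_acc = accuracy_score(y_te, pred_te)
print("Train accuracy:", train_acc, "Test accuracy:", test_acc)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, pred_te)
print(cm)

# Precision, recall and F1 for each class.
print(classification_report(y_te, pred_te))
```

In the notebook itself, the same calls could be applied to `best_model`, `X_test` and `y_test`. If the train/test gap is a concern, limiting `max_depth` or raising `min_samples_leaf` are common things to try.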
In [63]:
indices = np.argsort(best_model.feature_importances_)[::-1]
print(indices)
print([columns_minus_price[i] for i in indices])
[34 14  6 28 32 30 37  5 20  9 15 39 12 36 41  4 29  3 42 25 24 31 19  2
 33 17 16 18 47 38 44  7  8 13 35 26 21 11 10 22 23 46  0 40 45 27 43  1]
['room_type_Entire home/apt', 'reviews_per_month', 'bedrooms', 'accommodates', 'room_type_Private room', 'number_of_reviews', 'availability_365', 'beds', 'review_scores_rating', 'bathrooms', 'host_listings_count', 'guests_included', 'host_since_in_years', 'maximum_nights', 'minimum_nights', 'host_response_rate', 'review_scores_location', 'property_type_House', 'review_scores_value', 'property_type_Apartment', 'review_scores_cleanliness', 'review_scores_accuracy', 'host_response_time_within an hour', 'room_type_Shared room', 'host_response_time_within a few hours', 'review_scores_checkin', 'host_response_time_within a day', 'review_scores_communication', 'bed_type_Real Bed', 'property_type_Loft', 'property_type_Bed & Breakfast', 'property_type_Condominium', 'bed_type_Futon', 'property_type_Townhouse', 'bed_type_Pull-out Sofa', 'host_response_time_a few days or more', 'property_type_Other', 'property_type_Cabin', 'bed_type_Couch', 'bed_type_Airbed', 'property_type_Camper/RV', 'property_type_Bungalow', 'property_type_Dorm', 'property_type_Boat', 'property_type_Tent', 'property_type_Chalet', 'property_type_Yurt', 'property_type_Treehouse']
In [64]:
columns_n_importances = [(columns_minus_price[i], best_model.feature_importances_[i]) for i in indices]
In [65]:
plt.figure(figsize=(8,10))
top_20 = columns_n_importances[:20]  # (feature name, importance) pairs, most important first
g = sns.barplot(x=[imp for _, imp in top_20], y=[name for name, _ in top_20])
plt.xlabel("Feature Importance")
plt.ylabel("Fields")
plt.title("Top 20 features to determine listing price ranges (low/medium/high)")
plt.show()

Observation:

Using a random forest classifier to predict the price range (low/medium/high), the model reaches a test accuracy of about 77%. The training accuracy of 100% is much higher, which suggests that the model overfits the training data. The features are ranked by importance, and the visualisation shows the top 20 variables that influence the predicted price range. Room type (entire home/apt), reviews per month, number of bedrooms, number of guests accommodated and availability all rank among the most important features.
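The ranking above comes from the impurity-based `feature_importances_` of the random forest, which can split credit between correlated features (for example bedrooms, beds and accommodates) and tends to favour features with many distinct values. One possible cross-check is permutation importance on held-out data. This is a different technique from the one used in the notebook. The sketch below runs on synthetic data with hypothetical feature names, so its output is illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic three-class data; feature names are placeholders (assumption).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, n_informative=4, n_classes=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

demo_model = RandomForestClassifier(n_estimators=100, random_state=0)
demo_model.fit(X_tr, y_tr)

# Impurity-based importances: computed from the training data, sum to 1.
impurity_rank = pd.Series(
    demo_model.feature_importances_, index=feature_names
).sort_values(ascending=False)

# Permutation importances: mean drop in test score when a column is shuffled.
perm = permutation_importance(demo_model, X_te, y_te, n_repeats=5, random_state=0)
perm_rank = pd.Series(
    perm.importances_mean, index=feature_names
).sort_values(ascending=False)

print(impurity_rank.head(5))
print(perm_rank.head(5))
```

Features that rank highly in both lists are more likely to matter for the prediction; large disagreements are worth a closer look before drawing conclusions.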

Conclusion and Remarks: With more data, a better algorithm, or both, the classification of price ranges could likely be improved. The data also leaves room for further analysis, for example examining how temporal information affects price.